Introduction, Probability, & Distributions

Matt Talluto

21.11.2022

Course Introduction

This course will give you:

And not so much:

Why Bayes?

Why statistics at all? What is the goal of statistical analysis?

What is the probability that my model is correct given what I already know about it and what I’ve learned?

Probabilistic partitions

Imagine a box with a total area of 1, representing all possible events

If A and B cannot occur together, so that pr(A,B) = 0:

| | !B | B |
|---|---|---|
| !A | 1 - pr(A) - pr(B) | pr(B) |
| A | pr(A) | 0 |

If A and B can occur together, with joint probability pr(A,B):

| | !B | B |
|---|---|---|
| !A | 1 - pr(A) - pr(B) + pr(A,B) | pr(B) - pr(A,B) |
| A | pr(A) - pr(A,B) | pr(A,B) |
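
The four cells of the partition can be checked numerically. The sketch below uses made-up values for pr(A), pr(B), and pr(A,B); they are illustrative assumptions, not numbers from the course. It confirms that the cells fill the whole box (total area 1) and that the margins recover pr(A) and pr(B).

```r
# Hypothetical probabilities, chosen only for illustration
pr_A  = 0.3
pr_B  = 0.5
pr_AB = 0.1   # joint probability pr(A,B)

# Fill in the 2 x 2 partition: rows are !A / A, columns are !B / B
partition = matrix(
  c(1 - pr_A - pr_B + pr_AB, pr_B - pr_AB,
    pr_A - pr_AB,            pr_AB),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("!A", "A"), c("!B", "B"))
)
partition

sum(partition)       # the whole box has area 1
rowSums(partition)   # margins: pr(!A), pr(A)
colSums(partition)   # margins: pr(!B), pr(B)
```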

Independence

Conditional probability

\[pr(A,B) = pr(A|B)pr(B)\]
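
Rearranging the product rule gives pr(A|B) = pr(A,B) / pr(B). A minimal sketch with made-up probabilities (the three values are assumptions for illustration only) shows the calculation, and also the standard independence check: A and B are independent only if pr(A|B) equals pr(A).

```r
# Hypothetical probabilities, chosen only for illustration
pr_A  = 0.3
pr_B  = 0.5
pr_AB = 0.1   # joint probability pr(A,B)

# conditional probability from the product rule
pr_A_given_B = pr_AB / pr_B
pr_A_given_B

# multiplying back by pr(B) recovers the joint probability
isTRUE(all.equal(pr_A_given_B * pr_B, pr_AB))

# independence would require pr(A|B) == pr(A); here it does not hold
independent = isTRUE(all.equal(pr_A_given_B, pr_A))
independent
```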

So now that we have learned how to manipulate probabilities...

Can anyone define probability?

Manipulating conditional probabilities

Are you a (latent) zombie?

The problem:
0.1% of the population is infected with a parasite that will turn them into zombies. We have a test, but it is imperfect, with a false negative rate = 0.5% and a false positive rate = 1%.

You take the test, and the result is positive. What is the probability that you are actually going to become a zombie?

Hints

Detecting Zombies

Intuitively: the test is accurate, so the probability that someone who tests positive is a zombie should be high
(many people answer 99%, given the false positive rate of 1%).

Unintuitively: zombies are very rare, so when testing many people randomly, many tests will be false positives.

Desired outcome: \(pr(Z | T)\)
(if I test positive, what is the probability I am a zombie?)

Detecting Zombies — Contingency Table

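Imagine testing 1,000,000 people. 0.1% of them (1,000) are zombies; the test misses 0.5% of those (5) and wrongly flags 1% of the 999,000 non-zombies (9,990).

| | Test+ | Test- | Sum |
|---|---|---|---|
| Zombie | 995 | 5 | 1,000 |
| Not Zombie | 9,990 | 989,010 | 999,000 |
| Sum | 10,985 | 989,015 | 1,000,000 |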
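
The contingency table can be rebuilt from the three rates in the problem statement. This is a sketch, not the course's own code; the variable names are made up for illustration, and the population size of 1,000,000 is only a convenient choice that does not affect the final ratio.

```r
n         = 1e6     # imagined population size
pr_Z      = 0.001   # prevalence: pr(Z)
false_neg = 0.005   # pr(T' | Z)
false_pos = 0.01    # pr(T | Z')

zombies     = n * pr_Z
not_zombies = n - zombies

# rows: true state; columns: test result
tab = rbind(
  "Zombie"     = c(zombies * (1 - false_neg), zombies * false_neg),
  "Not Zombie" = c(not_zombies * false_pos,   not_zombies * (1 - false_pos))
)
colnames(tab) = c("Test+", "Test-")

# add the row and column sums
tab = cbind(tab, Sum = rowSums(tab))
tab = rbind(tab, Sum = colSums(tab))
tab

# pr(Z | T): zombies among all positive tests
pr_Z_given_T = tab["Zombie", "Test+"] / tab["Sum", "Test+"]
round(pr_Z_given_T, 4)
```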

The positive test is a given. This shrinks our world of possibilities to the Test+ column, where \(\frac{995}{10985}\) are zombies, or 9.06%.

| | Test+ |
|---|---|
| Zombie | 995 |
| Not Zombie | 9,990 |
| Sum | 10,985 |

Detecting Zombies — Conditional Probabilities

0.1% of the population is infected with a parasite that will turn them into zombies.

false negative rate = 0.5%
false positive rate = 1%

\(pr(T | Z) = 1 - pr(T' | Z) = 1 - 0.005 = 0.995\)

\(pr(T' | Z') = 1 - pr(T | Z') = 1 - 0.01 = 0.99\)

\(pr(Z,T) = pr(T|Z)pr(Z) = 0.995 \times 0.001 = 0.000995\)

\[pr(Z|T) = \frac{pr(T|Z)pr(Z)}{pr(T)}\]

(Bayes’ Theorem)

Detecting Zombies — Bayes’ Theorem

\[pr(Z|T) = \frac{pr(T|Z)pr(Z)}{pr(T)}\]

\[ \begin{aligned} pr(T) & = pr(T,Z) + pr(T,Z') \\ & = pr(T|Z)pr(Z) + pr(T|Z')pr(Z') \\ & = 0.995 \times 0.001 + 0.01 \times 0.999 \\ & = 0.000995 + 0.00999 \\ & = 0.010985 \end{aligned} \]

\[ \begin{aligned} pr(Z|T) & = \frac{pr(T|Z)pr(Z)}{pr(T)} \\ & = \frac{0.995 \times 0.001}{0.010985} \\ & = 0.0906 \end{aligned} \]
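
The same calculation can be wrapped in a small function. This is a minimal sketch; `posterior_positive` and its argument names are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: pr(Z | T) from prevalence and the two error rates
posterior_positive = function(prevalence, false_neg, false_pos) {
  pr_T_given_Z = 1 - false_neg                        # pr(T | Z)
  # total probability of a positive test: pr(T) = pr(T|Z)pr(Z) + pr(T|Z')pr(Z')
  pr_T = pr_T_given_Z * prevalence + false_pos * (1 - prevalence)
  pr_T_given_Z * prevalence / pr_T                    # Bayes' theorem
}

# the zombie problem: prevalence 0.1%, false negatives 0.5%, false positives 1%
round(posterior_positive(0.001, 0.005, 0.01), 4)   # 0.0906

# the function is vectorised, so several prevalences can be compared at once
posterior_positive(c(0.001, 0.01, 0.1, 0.3), 0.005, 0.01)
```

With the error rates held fixed, the posterior rises quickly as the prevalence increases, which is why the rarity of zombies dominates the original answer.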

Signal detection problems

The zombie example is cute, but it reflects a real biological problem: the “true” state is often hidden, and all we have is an imperfect signal.

| | Observed | Not Observed |
|---|---|---|
| Present | True positive | False negative |
| Absent | False positive | True negative |

Probability Concepts/Rules

Product rule => Chain rule

\[ \begin{aligned} pr(A,B) & = pr(A|B)pr(B) \\ \end{aligned} \]

\[ \begin{aligned} pr(A,B,C) & = pr(A|B,C)pr(B,C) \\ & = pr(A|B,C)pr(B|C)pr(C) \end{aligned} \]

\[ \begin{aligned} pr(\bigcap_{k=1}^{n} A_k) & = pr(A_n | \bigcap_{k=1}^{n-1} A_k )pr(\bigcap_{k=1}^{n-1} A_k) \\ & =\prod_{k=1}^{n}pr(A_k | \bigcap_{j=1}^{k-1}A_j) \end{aligned} \]
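
The chain rule can be checked numerically for three events. The sketch below assumes a made-up joint distribution over three binary events A, B, and C (the eight cell probabilities are illustrative only) and compares pr(A,B,C) with the product pr(A|B,C)pr(B|C)pr(C).

```r
# Hypothetical joint distribution over three binary events; cells sum to 1
joint = array(
  c(0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.25, 0.10),
  dim = c(2, 2, 2),
  dimnames = list(A = c("no", "yes"), B = c("no", "yes"), C = c("no", "yes"))
)
stopifnot(isTRUE(all.equal(sum(joint), 1)))

# joint and marginal probabilities for the events all occurring
pr_ABC = joint["yes", "yes", "yes"]
pr_BC  = sum(joint[, "yes", "yes"])   # sum over A
pr_C   = sum(joint[, , "yes"])        # sum over A and B

# conditional probabilities from the product rule
pr_A_given_BC = pr_ABC / pr_BC
pr_B_given_C  = pr_BC / pr_C

# chain rule: pr(A,B,C) = pr(A|B,C) pr(B|C) pr(C)
chain = pr_A_given_BC * pr_B_given_C * pr_C
c(joint = pr_ABC, chain_rule = chain)
```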

What if zombies are common?

The zombie distribution

\[ \begin{aligned} pr(k = 0 | n = 10, p = 0.3) & = 0.7 \times \ldots \times 0.7 \\ & = 0.7^{10} \\ & \approx 0.028 \end{aligned} \]

\[pr(k = 10 | n = 10, p = 0.3) = 0.3^{10} \approx 0.000006 \]

\[pr(Z_1,Z'_{2..10}) = 0.3 \times0.7^9 \approx 0.012 \]

\[pr(k=1|n=10,p=0.3) = 10 \times 0.3 \times0.7^9 \approx 0.121\]
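
The factor of 10 comes from the ten different positions the single zombie can occupy. One way to see this is to enumerate every possible outcome for 10 people by brute force. This is an illustrative sketch, not part of the original slides.

```r
n = 10
p = 0.3

# every possible sequence of 10 people, 1 = zombie, 0 = not: 2^10 = 1024 rows
outcomes = expand.grid(rep(list(0:1), n))
k = rowSums(outcomes)                 # number of zombies in each sequence

# probability of each specific sequence
pr_outcome = p^k * (1 - p)^(n - k)

sum(k == 1)                           # 10 sequences contain exactly one zombie
sum(pr_outcome[k == 1])               # 10 * 0.3 * 0.7^9
dbinom(1, n, p)                       # the same value from the binomial PMF

# summing over all sequences with the same k gives the full distribution
round(tapply(pr_outcome, k, sum), 3)
```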

The zombie binomial distribution

\[pr(Z_{a}, Z'_{a'}) = p^k(1 - p)^{(n - k)}\]

\[pr(k|n,p) = {n \choose k} p^k(1-p)^{(n-k)}\]

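# number of ways to place k = 0, ..., 10 zombies among n = 10 people
choose(n = 10, k = 0:10)
##  [1]   1  10  45 120 210 252 210 120  45  10   1
# binomial PMF: pr(k | n = 10, p = 0.3) for k = 0, ..., 10
round(dbinom(0:10, 10, 0.3), 3)
##  [1] 0.028 0.121 0.233 0.267 0.200 0.103 0.037 0.009 0.001 0.000 0.000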
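
As a quick check, the binomial formula can be written out by hand and compared with `dbinom`. A minimal sketch; the name `pmf_manual` is made up for illustration.

```r
n = 10
p = 0.3
k = 0:n

# binomial PMF written out from the formula
pmf_manual = choose(n, k) * p^k * (1 - p)^(n - k)
round(pmf_manual, 3)

# agrees with R's built-in PMF, and the probabilities sum to 1
isTRUE(all.equal(pmf_manual, dbinom(k, n, p)))
sum(pmf_manual)
```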

Binomial distribution

Binomial distribution

\[ pr(X \le k|n,p) = \sum_{i=0}^{k} {n \choose i}p^i(1-p)^{(n-i)} \]

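# binomial CDF: pr(X <= k | n = 10, p = 0.3) for k = 0, ..., 10
k = 0:10
y = pbinom(k, 10, 0.3)
round(y, 3)
##  [1] 0.028 0.149 0.383 0.650 0.850 0.953 0.989 0.998 1.000 1.000 1.000
# the CDF at k = 2 is the sum of the PMF over 0, 1, 2
round(sum(dbinom(0:2,10,0.3)), 3)
## [1] 0.383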
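
More generally, the whole CDF is the running sum of the PMF, and upper-tail probabilities follow from its complement. A short sketch using only base R functions:

```r
k = 0:10

# CDF as the cumulative sum of the PMF
cdf_manual = cumsum(dbinom(k, 10, 0.3))
isTRUE(all.equal(cdf_manual, pbinom(k, 10, 0.3)))

# pr(X > 2): two equivalent ways to get the upper tail
upper_a = 1 - pbinom(2, 10, 0.3)
upper_b = pbinom(2, 10, 0.3, lower.tail = FALSE)
c(upper_a, upper_b)
```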

Poisson distribution

# Poisson PMF and CDF for a single rate, lambda = 5, at x = 0, ..., 20
lam = 5
pois_dat = data.frame(x = 0:20)
pois_dat$pmf = dpois(pois_dat$x, lam)   # pr(X = x)
pois_dat$cdf = ppois(pois_dat$x, lam)   # pr(X <= x)

Poisson distribution

# Poisson PMF and CDF for every combination of x and four different rates
lam = c(0.5, 2, 5, 20)
pois_dat = expand.grid(x=0:50, lam=lam)
pois_dat$pmf = dpois(pois_dat$x, pois_dat$lam)
pois_dat$cdf = ppois(pois_dat$x, pois_dat$lam)

Negative binomial distribution

\[\mu = \frac{p\,r}{1-p}\]

\[ s^2 = \mu + \frac{\mu^2}{r} \]

# negative binomial PMF and CDF, parameterised by mean (mu) and dispersion (size)
dat = expand.grid(x = 0:60, mu = c(10,20), size = c(5, 2))
dat$pmf = with(dat, dnbinom(x, mu=mu, size=size))
dat$cdf = with(dat, pnbinom(x, mu=mu, size=size))
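
The variance formula can be checked against `dnbinom` by summing over the PMF. This sketch assumes that \(r\) in the formula corresponds to R's `size` argument, which matches the variance given in R's own documentation for the `mu` parameterisation. The upper limit of 5000 is an arbitrary truncation point, far enough out that the remaining probability is negligible.

```r
mu   = 10
size = 5
x    = 0:5000   # truncated support; the tail beyond this is negligible

pmf = dnbinom(x, mu = mu, size = size)

# mean and variance computed directly from the PMF
m = sum(x * pmf)
v = sum((x - m)^2 * pmf)

c(mean = m, variance = v, formula = mu + mu^2 / size)
```

A smaller `size` gives a larger variance for the same mean, so the distribution becomes more overdispersed relative to a Poisson with the same \(\mu\).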

Exponential distribution

# exponential PDF and CDF on a grid of x values for four different rates
lam = c(0.5, 2, 5, 20)
dat = expand.grid(x=seq(0,15, length.out=100), lam=lam)
dat$pdf = dexp(dat$x, dat$lam)
dat$cdf = pexp(dat$x, dat$lam)

Gamma distribution



# gamma PDF and CDF for every combination of two shapes and two rates
dat = expand.grid(x=seq(0,15, length.out=100), shape=c(0.5, 4), rate = c(0.2, 2))
dat$pdf = with(dat, dgamma(x, shape=shape, rate = rate))
dat$cdf = with(dat, pgamma(x, shape=shape, rate = rate))

Normal distribution

# normal PDF and CDF with mean 0 and three different standard deviations
dat = expand.grid(x=seq(-6,6, length.out=100), mu=0, sd = c(0.2, 1, 2))
dat$pdf = with(dat, dnorm(x, mu, sd))
dat$cdf = with(dat, pnorm(x, mu, sd))

Beta distribution

# beta PDF and CDF on [0, 1] for every combination of alpha and beta
dat = expand.grid(x=seq(0,1, length.out=100), alpha=c(0.5, 1, 2), beta = c(0.5, 1, 2))
dat$pdf = with(dat, dbeta(x, alpha, beta))
dat$cdf = with(dat, pbeta(x, alpha, beta))

Distribution functions